Skip to content

[DWARF] Speedup .gdb_index dumping #151806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 7, 2025

Conversation

itrofimow
Copy link
Contributor

This patch drastically speed ups dumping .gdb_index for large indexes

@llvmbot
Copy link
Member

llvmbot commented Aug 2, 2025

@llvm/pr-subscribers-debuginfo

Author: None (itrofimow)

Changes

This patch drastically speed ups dumping .gdb_index for large indexes


Full diff: https://github.com/llvm/llvm-project/pull/151806.diff

1 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp (+20-5)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
index 987e63963a068..c0ad2a38df373 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
@@ -17,6 +17,7 @@
 #include <cinttypes>
 #include <cstdint>
 #include <set>
+#include <unordered_map>
 #include <utility>
 
 using namespace llvm;
@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
                ", filled slots:",
                SymbolTableOffset, (uint64_t)SymbolTable.size())
      << '\n';
+
+  std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
+      CuVectorMap{};
+  CuVectorMap.reserve(ConstantPoolVectors.size());
+  const auto FindCuVector =
+      [&CuVectorMap, notFound = ConstantPoolVectors.end()](uint32_t vecOffset) {
+        const auto it = CuVectorMap.find(vecOffset);
+        if (it != CuVectorMap.end()) {
+          return it->second;
+        }
+
+        return notFound;
+      };
+  for (auto it = ConstantPoolVectors.begin(); it != ConstantPoolVectors.end();
+       ++it) {
+    CuVectorMap.emplace(it->first, it);
+  }
+
   uint32_t I = -1;
   for (const SymTableEntry &E : SymbolTable) {
     ++I;
@@ -72,11 +91,7 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
     StringRef Name = ConstantPoolStrings.substr(
         ConstantPoolOffset - StringPoolOffset + E.NameOffset);
 
-    auto CuVector = llvm::find_if(
-        ConstantPoolVectors,
-        [&](const std::pair<uint32_t, SmallVector<uint32_t, 0>> &V) {
-          return V.first == E.VecOffset;
-        });
+    auto CuVector = FindCuVector(E.VecOffset);
     assert(CuVector != ConstantPoolVectors.end() && "Invalid symbol table");
     uint32_t CuVectorId = CuVector - ConstantPoolVectors.begin();
     OS << format("      String name: %s, CU vector index: %d\n", Name.data(),

@itrofimow
Copy link
Contributor Author

I have a binary with gdb-index of size ~250Mb, and for that binary llvm-dwarfdump --gdb-index takes basically forever (10+ minutes) to complete.
With the patch applied it takes ~5s

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement!

Hmm, actually at a high level: I guess this ConstantPoolVectors isn't sorted, is it? So we can't do a binary search... could we sort it? I guess not - since we do want to dump it in a way that matches the input too (in case the on-disk ordering is important to debugging the data at some point)?

(oh, and high level question, if you're interested/able: What's your interest in gdb_index? Myself, I've worked on various indexing solutions at Google due to the large size of single binaries we have, for a while but we rarely see traction/interest in these tools outside of Google - so it's always interesting to make friends with folks who are facing similar problems)

Comment on lines 78 to 80
++it) {
CuVectorMap.emplace(it->first, it);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could drop the {} here.

Could use a range-based for loop, and instead of putting iterators as values in the map, use pointers (then you can get a pointer to the value in the range based for loop where there aren't any visible/name-able iterators)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rearranged the code a bit to just hold the ids in the map, instead of calculating these ids later

Copy link

github-actions bot commented Aug 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@itrofimow
Copy link
Contributor Author

Thanks for the review!


Hmm, actually at a high level: I guess this ConstantPoolVectors isn't sorted, is it? So we can't do a binary search... could we sort it? I guess not - since we do want to dump it in a way that matches the input too (in case the on-disk ordering is important to debugging the data at some point)?

I think it is sorted by construction, but given that we look for exact match, for big enough vectors it likely would still be faster to put offset->id in a hash map rather than do a binary search.

oh, and high level question, if you're interested/able: What's your interest in gdb_index?

Actually, I'm not particularly interested in gdb_index itself, although it's always nice to learn how smart people organize their data; I've encountered this inefficiency when debugging BOLT generating gdb-index in ways that break gdb (#151857, #151861).

We at Yandex are also dealing with somewhat large binaries, and even though our toolchain is mostly llvm, gdb is the go-to debugger for probably historical reasons, and there is a lot of infrastructure built on top of it here and there (coredump analysis, some custom poor man's profilers, pretty-printers etc.). At Perforator we are working to fully support generating BOLT-profiles, and thereafter incorporate BOLT into release pipelines; these BOLT issues manifested themselves, and I went digging

Myself, I've worked on various indexing solutions at Google due to the large size of single binaries we have

Although I'm not particularly interested in gdb_index, I am very much interested into another index format: GSYM. Given your vast experience with different indexes, maybe you know places/folks to ask questions/request reviews about it?

@itrofimow
Copy link
Contributor Author

CI failure is the CodeGen/AArch64/midpoint-int.ll test, which seems completely unrelated to me

@itrofimow itrofimow requested a review from dwblaikie August 6, 2025 12:52
@dwblaikie
Copy link
Collaborator

Myself, I've worked on various indexing solutions at Google due to the large size of single binaries we have

Although I'm not particularly interested in gdb_index, I am very much interested into another index format: GSYM. Given your vast experience with different indexes, maybe you know places/folks to ask questions/request reviews about it?

Yeah, that's definitely @clayborg's wheelhouse.

(thanks for the other context on your use cases)

Hmm, actually at a high level: I guess this ConstantPoolVectors isn't sorted, is it? So we can't do a binary search... could we sort it? I guess not - since we do want to dump it in a way that matches the input too (in case the on-disk ordering is important to debugging the data at some point)?

I think it is sorted by construction, but given that we look for exact match, for big enough vectors it likely would still be faster to put offset->id in a hash map rather than do a binary search.

Hmm - is it? It looked like teh offsets were read in from the file in the SymTableSize loop parseImpl - doesn't look like that's necessarily ordered...

@itrofimow
Copy link
Contributor Author

Thanks for the directions! I'll reach out to Greg about GSYM


Hmm - is it? It looked like teh offsets were read in from the file in the SymTableSize loop parseImpl - doesn't look like that's necessarily ordered...

The CUOffsets are read in here into a set

std::set<uint32_t> CUOffsets;
for (uint32_t i = 0; i < SymTableSize; ++i) {
uint32_t NameOffset = Data.getU32(&Offset);
uint32_t CuVecOffset = Data.getU32(&Offset);
SymbolTable.push_back({NameOffset, CuVecOffset});
if (NameOffset || CuVecOffset)
CUOffsets.insert(CuVecOffset);
}

and later used to populate ConstantPoolVectors like this
for (auto CUOffset : CUOffsets) {
Offset = ConstantPoolOffset + CUOffset;
ConstantPoolVectors.emplace_back(0, SmallVector<uint32_t, 0>());
auto &Vec = ConstantPoolVectors.back();
Vec.first = Offset - ConstantPoolOffset;

so I think they are indeed ordered, but I still would prefer a hash-map here :)

@itrofimow
Copy link
Contributor Author

The CI is green after rebase; could you @dwblaikie please merge this PR for me? - I don't have write access

@dwblaikie
Copy link
Collaborator

Ah, right - didn't notice it was reading into a set.

For argument's sake - could you check the performance with a binary search? See if it's acceptable for your use case. Be nice not to have to build another data structure.

@itrofimow
Copy link
Contributor Author

For argument's sake - could you check the performance with a binary search? See if it's acceptable for your use case. Be nice not to have to build another data structure.

With the hashtable user-time averages to 3.73s
With the binary search user-time averages to 4.21s

Both are perfectly fine for me, your call

@dwblaikie
Copy link
Collaborator

Really appreciate your willingness to give it a go. And that kind of tradeoff is prefer the binary search - especially if the implementation comes out pretty tidy/small - hopefully can use llvm:: binary_search or similar

@dwblaikie dwblaikie merged commit 069bf18 into llvm:main Aug 7, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants